I. INTRODUCTION

Emergent behaviors in a complex system depend cru- 
cially on the pattern of interactions between its compo- 
nents [1-3]. For example, we observe a cascade when 
local interactions in the vicinity of an initially isolated 
effect allow that effect to propagate globally [4, 5]. The 
network substrate of a system represents this pattern in 
its most abstract and analytically tractable form. This 
information can be used to construct network models, 
which provide theoretical insights into the causes of such 
behaviors. A fundamental problem for the construc- 
tion of these models is the determination of precisely 
which structural features are requisite to explain the phe- 
nomenon in question and which others are superfluous. 

In the configuration model [6, 7] an ensemble of random
graphs is prescribed by a degree distribution $p_k$. In
each realization drawn from this ensemble, a randomly
selected vertex will have k incident edges with probability
$p_k$. This distribution represents the first order of
complexity for most network models. From this, a more 
realistic model can be constructed by including degree- 
degree correlations [8-11] and/or various forms of clus- 
tering [12-15], both of which are explicitly absent from 
the configuration model. Recently, the study of multiplex 
networks has introduced a further degree of complexity 
to this general approach [16-18]. These networks consist 
of connected layers of networks, where each layer involves 
interactions of a fundamentally unique kind. 

In this paper we focus on random graphs with cluster- 
ing; specifically, those defined by Gleeson in [15]. Real 
networks typically contain a large number of short cycles 
in which a small set of vertices maintain a closed loop of 
connections. One way to measure the propensity for a 
vertex to form these types of bonds is through the local 
clustering coefficient, which is defined as the fraction of 
pairs of neighbors of a vertex that are also neighbors of 
each other [19]. The degree-dependent clustering coef- 
ficient or clustering spectrum Ck is found by averaging 
the local clustering coefficient over the class of vertices 
of degree k [20, 21]. A global measure of clustering C2 
can be defined by averaging the local coefficients of all 
N vertices in the graph. Gleeson [15] has shown how the 



configuration model can be modified to generate ensem- 
bles of highly clustered graphs (see also [12-14]). This is 
achieved by embedding cliques of connected vertices into 
an otherwise treelike structure. Each ensemble is prescribed
by the joint distribution $\gamma(k,c)$: the probability
that in any realization a randomly selected vertex has 
degree k and is in a clique of c vertices (a c-clique). 
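
The clustering measures defined above can be illustrated with a minimal sketch. The adjacency-dictionary representation, the function names, and the convention that a vertex with fewer than two neighbors has coefficient zero are assumptions made for this example; they are not taken from [19-21].

```python
from itertools import combinations

def local_clustering(adj, v):
    """Fraction of pairs of neighbors of v that are also neighbors of each other."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: no neighbor pairs, so the coefficient is 0
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)

def clustering_spectrum(adj):
    """Average the local coefficient over the class of vertices of each degree k."""
    sums, counts = {}, {}
    for v in adj:
        k = len(adj[v])
        sums[k] = sums.get(k, 0.0) + local_clustering(adj, v)
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

# Hypothetical toy graph: a triangle (0, 1, 2) plus a pendant vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
spectrum = clustering_spectrum(adj)
```

In this toy graph only one of the three neighbor pairs of vertex 0 is linked, so its local coefficient is 1/3, while both degree-2 vertices have coefficient 1.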

Our aim is to provide a generalized analytical approach 
to determining the expected cascade size on these $\gamma(k,c)$
or clique-based graphs. This goes far beyond the bond 
percolation process studied in [15] to include a broad 
class of cascade processes including Watts's threshold 
model [4], k-core decomposition [22, 23], and both site
and bond percolation [24, 25]. Also of relevance is our 
earlier work [26] on cascades on edge-triangle graphs 
[13, 14]. Edge-triangle graphs are created by embedding 
3-cliques, and only 3-cliques, into an otherwise treelike 
structure. In each such graph a randomly chosen vertex 
is incident to s single edges and 2t triangle edges with 
probability p(s,t). In contrast, a $\gamma(k,c)$ graph can contain
cliques of many different sizes and may therefore
have local clustering levels that are much higher than 
those in edge-triangle graphs. Furthermore, $\gamma(k,c)$ can
be parametrized to match the empirical clustering spectrum
$C_k$ and degree distribution $p_k$ of a real-world network
[15]. This additional complexity means that a very
different analytical approach from that of [26] is required 
here. Our approach thus provides another significant ex- 
tension of the methods used by Gleeson and Cahalane 
[27] and Gleeson [5], who provided analytical results for 
cascades on configuration-model graphs by introducing a 
tree-based framework of level-by-level vertex activations. 
This method was inspired by methods originally devel- 
oped to study the zero-temperature random-field Ising 
model on a Bethe lattice [28-30]. 

The class of cascade dynamics examinable through the 
tree-based framework consists of those processes that sat- 
isfy the following list of properties: (i) each vertex is 
assigned a binary value specifying its current state, ac- 
tive (damaged or infected) or inactive (undamaged or 
susceptible); (ii) the probability of a vertex becoming active
(in a synchronous update of all vertices) depends
only on its degree k and the number m of its neighbors
that are already active; this probability, denoted $F^k_m$, is
termed the neighborhood influence response function [31, 32]; (iii) for
any fixed degree k, $F^k_m$ is a nondecreasing function of m;
and (iv) once active, a vertex cannot become deactivated 
[33]. Each of the processes referred to in the preced- 
ing paragraph satisfies these constraints and is defined 
by choosing an appropriate $F^k_m$, as detailed in [5]. The
goal of our analytical approach is the prediction of the 
expected size of the cascade when a time-dependent pro- 
cess of the type described here has run to completion. 
Our analytical results are defined as the fixed point of 
an iterative process, i.e., the solution of a self-consistent 
system of equations, but the level-by-level activation ap- 
proach used in our analysis should not be misunderstood 
as a time-dependent process in its own right; rather it 
is a convenient representation of the iteration scheme for 
solving for the steady-state solution. 

The remainder of this paper is structured as follows. 
In Sec. II we describe in broad outline our generalized ap- 
proach to cascade dynamics on clique-based graphs. As 
well as an analytical expression for the expected cascade 
size, we provide a first-order condition for the existence 
of cascades whose size scales with the number of vertices 
N as $N \to \infty$. Section III deals in greater detail with
the particulars of clique member activations. We show 
how to calculate in closed form the number of active vertices
in a clique of any size $c \le k + 1$. The analysis of
both sections is described in terms of an arbitrary re- 
sponse function. The particular forms that this response 
takes for various processes are discussed in Sec. IV, where 
we demonstrate the correspondence between our analyti- 
cal results and numerical simulations of bond percolation 
and Watts's model. In Sec. V we present a possible ex- 
tension of Watts's model in which different weights are 
assigned to active clique neighbors and active nonclique 
neighbors. This allows us to vary the influence of a ver- 
tex's neighbors on its probability of activation between 
these two subgroups. We suggest that in future analy- 
ses this may provide important insights into the role of 
group structure and peer influence in processes of social 
contagion, such as opinion formation [34, 35]. 



II. CASCADE ANALYSIS 

As was also the case for edge-triangle graphs [26], in 
order to extend to clique-based graphs the approach of [5] 
we must first reconcile the presence of clustering with the 
locally treelike approximation on which that approach is 
founded. In considering how best to proceed, let us re- 
turn briefly to [15] and remind ourselves of the structural 
properties of the $\gamma(k,c)$ ensemble.

In Fig. 1 we have reproduced Fig. 2 of [15]. This figure
shows a portion of an arbitrary $\gamma(k,c)$ graph that has
been reconfigured into a treelike formation. The essential 
characteristics of this reconfiguration can be explained 
most succinctly by looking at the local edge topology of 
the randomly chosen vertex A. This vertex, positioned 




FIG. 1. (Color online) Level-by-level cascade propagation in
a $\gamma(k,c)$ graph using the tree approximation. External edges
emphasized.



on level $n + 1$ of the tree, has degree $k = 6$ and is a member
of a 4-clique. Its six incident edges are made up of
$c - 1 = 3$ internal edges, which connect A to its neighboring
clique members, and $k - c + 1 = 3$ external edges
(emphasized). Of these external edges, one connects A to
its parent vertex on the next level up, while the remaining
$k - c = 2$ connect A to its external children on level
n. The clique neighbors are positioned on an unlabelled
intermediate level between A and its grandchildren (circled
with dashed line) on level n. This categorization and
positioning of vertices is representative of how the tree- 
based framework operates throughout the graph. Note 
that any vertex may be treated similarly to A regardless 
of the size of the clique to which it belongs. For any 
(k, c) pairing such that k > c — 1 (see [15]), c — 1 clique 
neighbors can always be made to reside in the interspace 
between a vertex and the level below and one may also 
stipulate in general that at most one external edge leads 
to the parent above. In extreme vertex with 

no internal edges is simply a member of a 1-clique and 
therefore all of its connections will pass directly from one 
level to the next (c — 1 = 0), as in [27]. A vertex with no 
external edges must reside either at the root of the tree 
and have no parent if it is part of a clique or it must be 
entirely isolated and have zero connections in total. 

This, then, was the key that allowed Gleeson to cal- 
culate the giant connected component (GCC) size S in 
bond percolation on $\gamma(k,c)$ graphs. Equation (5) of [15]
was used to determine the conditional probability that a 
vertex like A is active (part of the GCC) on each level of 
the tree and Eq. (6) of [15] then gave S as the probability 
of activation of the root vertex by using the steady-state 
value from Eq. (5) . The restriction of this theory to bond 
percolation arises primarily from its reliance on a set of 
polynomials that were defined and tabulated by New- 
man in [36]. Crucially, however, those polynomials play 
no role in the conceptualization described above. Thus 
our task of extending the theory of [5] amounts to tak- 
ing this framework and introducing the response function 
mechanism. Since we shall not apply the polynomials of 
[36], a straightforward substitution of $F^k_m$ will not suffice.
In fact, as we will now show, our approach requires a set
of equations entirely different from those of [15]. 

A. Expected cascade size 

With the theoretical foundations in place, we can begin 
to derive generalized analytical expressions for cascades 
on $\gamma(k,c)$ graphs. We proceed in the familiar manner by
considering the probability $q_{n+1}$ that the randomly selected
vertex A in Fig. 1 is active, conditional on its parent
vertex being inactive. As is usual for the tree-based
approach, we stipulate that the vertex A can become ac- 
tive only due to the influence of the states of the neigh- 
boring vertices directly below it in the tree. In this case, 
however, A has two different types of neighbors: It has 
$k - c$ external children on level n and $c - 1$ clique neighbors
on the intermediate level. Significantly, the ways in
which these two types of neighbor can become active in 
their own right are quite distinct from each other. Thus 
their contributions to the probability of activation of A 
must be calculated separately. This is the first problem 
to be addressed. 

Starting with the simpler of the two contributions, let 
us write down the probability that an arbitrary number, 
call it j, of A's external neighbors are active. Since there 
is no clustering between these vertices, each one is independently
activated by its own children on level $n - 1$
with probability $q_n$. Therefore, the probability that a
total of j out of $k - c$ external neighbors are activated
in this way is given simply by the binomial probability 
mass function (PMF) 

$$B^{k-c}_j(q_n) = \binom{k-c}{j}\, q_n^{\,j}\, (1 - q_n)^{k-c-j}. \tag{1}$$
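
As a minimal sketch of Eq. (1), the binomial PMF can be evaluated directly. The function name `binom_pmf` and the numerical values of k, c, and $q_n$ below are assumptions chosen purely for illustration (they mirror the vertex A of Fig. 1, which has $k - c = 2$ external children).

```python
from math import comb

def binom_pmf(n, j, q):
    """Probability that exactly j of n independent children are active,
    each being active with probability q (the PMF of Eq. (1) with n = k - c)."""
    if j < 0 or j > n:
        return 0.0
    return comb(n, j) * q**j * (1.0 - q)**(n - j)

# Hypothetical values: a vertex of degree k = 6 in a 4-clique, with q_n = 0.2.
k, c, q_n = 6, 4, 0.2
pmf = [binom_pmf(k - c, j, q_n) for j in range(k - c + 1)]
```

Here `pmf` lists the probabilities that 0, 1, or 2 of the two external children are active; the entries sum to one.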

For the second contribution to A, matters are made
considerably more complicated by the fact that its $c - 1$
clique neighbors are fully connected. This means that the
probability that each of these clique neighbors is active 
depends not only on the states of their children — the four 
grandchildren of A on level n — but also on the states of 
one another. Recall from the derivation of our theory for 
cascades on p(s, t) graphs in [26] that we had to account 
for the fact that each vertex at the base of a triangle can 
directly influence the state of the other. We are faced 
with a similar problem here; however, since we are now 
dealing with $\gamma(k,c)$ graphs we have a whole spectrum of
clique sizes to contend with. 

One can appreciate how much more intricate this will 
make our calculations by imagining that A were part of a 
very large clique [as it could be, depending on our choice 
of $\gamma(k,c)$]. For example, if A were in a 10-clique, then
$c - 1 = 9$ intermediate vertices would each have a role
to play in determining each other's states. The solution
in this case would require an extensive list of combina- 
torial expressions similar to, but extending far beyond, 
Eqs. (5)-(8) of [26]. Ideally, we would like to avoid tabu- 
lating combinatorial terms altogether and instead have 



a single compact analytical expression that is flexible 
enough to deal with any clique size. This expression 
would allow us to feed in the total number of clique neigh- 
bors as a variable and would then return the probability 
that a certain fraction of them are active. Evidently, the 
derivation of such an expression is not straightforward. 
We shall therefore postpone this task until later in our 
presentation. 

In the meantime, we continue our analysis of cascade 
propagation by simply providing the name of this function
and taking it for granted that later in Sec. III we will
define precisely how it operates. Let us call the relevant
function $R^{c-1}_m(q_n)$ and in doing so refer to it as the probability
that in a clique of $c - 1$ intermediate vertices a
total of m are active, conditional on the top vertex of the
c-clique to which they belong (vertex A in Fig. 1) being
inactive. The dependence on $q_n$ arises from the fact that
each intermediate vertex has its own set of children on
level n and each of those children (A's grandchildren in
Fig. 1) is active with probability $q_n$. Summing over all
possible values of m gives $\sum_{m=0}^{c-1} R^{c-1}_m(q_n) = 1$.

If we accept the meaning of the label $R^{c-1}_m(q_n)$ and
combine it with Eq. (1) above, we now have the necessary
terms in which to express the contribution of A's external
children and clique neighbors towards its probability
of activation $q_{n+1}$. This takes us very close to defining
an iterative equation for $q_{n+1}$ in terms of $q_n$. The missing
ingredient is the probability $\zeta(k,c)$ that the random
vertex A, while having degree k and being a member of 
a c-clique, is also the child of a random vertex on level 
n + 2. This probability plays a role similar to that of the 
term (k/z)pk in Eq. (1) of [5], which gives the probabil- 
ity of reaching a child of degree k by traveling along a 
randomly chosen edge from its parent in a nonclustered 
graph (see [37]). Similarly, here $\zeta(k,c)$ closes our iteration
by allowing us to average over all vertices on level
n + 1 in the correct manner. We express this probability 
as 

$$\zeta(k,c) = \frac{(k - c + 1)\,\gamma(k,c)}{z_e}, \tag{2}$$

where $z_e = \sum_{k,c} (k - c + 1)\,\gamma(k,c)$ is the average number
of external edges per vertex.
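
The weighting of Eq. (2) can be sketched as follows. The dictionary representation of $\gamma(k,c)$, the function name `zeta_from_gamma`, and the two-class example distribution are hypothetical choices made for illustration only.

```python
def zeta_from_gamma(gamma):
    """Return (zeta, z_e) for a joint distribution gamma given as {(k, c): probability}.

    Each (k, c) class is weighted by its number of external edges, k - c + 1,
    and normalized by the mean number of external edges per vertex, z_e."""
    z_e = sum((k - c + 1) * g for (k, c), g in gamma.items())
    zeta = {(k, c): (k - c + 1) * g / z_e for (k, c), g in gamma.items()}
    return zeta, z_e

# Hypothetical ensemble: half the vertices have degree 3 and sit in no clique (c = 1),
# half have degree 4 and belong to a triangle (c = 3).
gamma = {(3, 1): 0.5, (4, 3): 0.5}
zeta, z_e = zeta_from_gamma(gamma)
```

In this example the (3, 1) vertices carry three external edges each and the (4, 3) vertices two, so a randomly followed external edge reaches a (3, 1) vertex with probability 0.6.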

Combining all three of our ingredients, we can now
write our generalized iterative equation in terms of an
arbitrary response function $F^k_{m+j}$ as

$$q_{n+1} = \rho_0 + (1 - \rho_0) \sum_{k,c} \zeta(k,c)\, \Psi(q_n, k - 1), \tag{3}$$

where

$$\Psi(q_n, x) = \sum_{j=0}^{x-c+1} \sum_{m=0}^{c-1} B^{x-c+1}_j(q_n)\, R^{c-1}_m(q_n)\, F^k_{m+j}. \tag{4}$$

Thus we have derived an analytical expression for the 
probability that a randomly chosen vertex on the next 
level up, generically called n+1, is active, conditional on 
its parent being inactive. Referring once again to Fig. 1,
Eq. (3) tells us that the vertex A will be found active
if it was initially activated as part of the seed fraction
$\rho_0$ or (with probability $1 - \rho_0$) if it subsequently became
active in response to the states of the $x = k - 1$ neighbors
directly below it in the tree. For the latter, Eq. (4)
indicates that there are two distinct contributions from
two different sets of neighbors: one from the external
children of A and the other from the intermediate clique
members. A total of j of the first type of neighbor are active
with probability $B^{k-c}_j(q_n)$ and m of the second type
with probability $R^{c-1}_m(q_n)$. Whether the sum of j and m
is sufficient to activate A is determined by the response
function $F^k_{m+j}$.

In the usual manner, iterating Eq. (3) to the steady
state will give us $q_\infty$. This value can then be used in
the following expression to determine the probability of
activation of the root vertex:

$$\rho = \rho_0 + (1 - \rho_0) \sum_{k,c} \gamma(k,c)\, \Psi(q_\infty, k). \tag{5}$$

The probability $\rho$ is equivalent to the expected cascade
size (see the discussion in [26]). The differences between
this equation and Eq. (3) above are attributable to the
fact that the root vertex has no parent. This means that
all of the root's k edges extend downwards to its children,
hence $\Psi(q_\infty, k)$. It also means that the correct term for
averaging is simply $\gamma(k,c)$.
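
The iteration of Eq. (3) and the final evaluation of Eq. (5) can be sketched numerically once the clique function and the response function are supplied as callables. Everything named below (`psi`, `cascade_size`, `R_tree`, `F_watts`, the threshold 0.3, the seed fraction 0.01, and the 3-regular example ensemble) is an assumption made for illustration. In particular, the demonstration is restricted to 1-cliques, for which zero clique neighbors are active with certainty, so it exercises only the unclustered limit of the theory.

```python
from math import comb

def binom_pmf(n, j, q):
    """Binomial PMF of Eq. (1)."""
    return comb(n, j) * q**j * (1.0 - q)**(n - j)

def psi(q, x, k, c, R, F):
    """Double sum of Eq. (4); x is the number of neighbors below the vertex."""
    n_ext = x - c + 1  # external children below the vertex
    return sum(binom_pmf(n_ext, j, q) * R(c, m, q) * F(m + j, k)
               for j in range(n_ext + 1) for m in range(c))

def cascade_size(gamma, R, F, rho0, n_iter=500):
    """Iterate Eq. (3) from q_0 = rho0, then evaluate Eq. (5) at the fixed point."""
    z_e = sum((k - c + 1) * g for (k, c), g in gamma.items())
    q = rho0
    for _ in range(n_iter):
        q = rho0 + (1.0 - rho0) * sum(
            (k - c + 1) * g / z_e * psi(q, k - 1, k, c, R, F)
            for (k, c), g in gamma.items())
    rho = rho0 + (1.0 - rho0) * sum(
        g * psi(q, k, k, c, R, F) for (k, c), g in gamma.items())
    return q, rho

def R_tree(c, m, q):
    # Assumption: only 1-cliques are present, so m = 0 with probability one.
    return 1.0 if (c == 1 and m == 0) else 0.0

def F_watts(m, k, threshold=0.3):
    # Hypothetical Watts-type response: activate when the active fraction reaches the threshold.
    return 1.0 if (k > 0 and m / k >= threshold) else 0.0

gamma = {(3, 1): 1.0}  # hypothetical 3-regular, unclustered ensemble
q_inf, rho = cascade_size(gamma, R_tree, F_watts, rho0=0.01)
```

With these choices a single active neighbor suffices to activate a vertex, so the iteration is driven to the fixed point $q_\infty = 1$ and the expected cascade size is the whole graph.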

Taken together, then, Eqs. (3)-(5) constitute the core 
of our present analytical approach. We can use these 
equations to investigate various different cascade processes
by applying the appropriate definition of the response
function $F^k_{m+j}$ in each case. In Sec. IV we will
provide the definitions of $F^k_{m+j}$ for bond percolation and
Watts's model. Before that, we must also define the function
$R^{c-1}_m(q_n)$. This task will occupy all of Sec. III. Next,
let us conclude Sec. II by deriving a general first-order 
cascade condition. 



B. Cascade condition 

The cascade condition determines whether an infinitesimally
small seed fraction $\rho_0$ of active vertices will generate
a nonvanishing mean cascade size as the total number
of vertices in the graph diverges ($N \to \infty$). For this to
happen the iteration of Eq. (3) must cause the activation
probability $q_n$ to grow from an initial value $q_0 = 0$ to a
nonzero steady-state $q_\infty$ [5]. If we regard Eq. (3) (with
$\rho_0 = 0$) as a nonlinear function of q of the general form
$q_{n+1} = H(q_n)$, then this last condition can be expressed,
to first order, as $H'(0) > 1$.

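To evaluate $H'(0)$ we require the following results for
the binomial PMF of Eq. (1):

$$B^{k-c}_j(0) = \delta_{j,0}, \tag{6}$$

$$\frac{d}{dq} B^{k-c}_j(q)\bigg|_{q=0} = (k - c)(\delta_{j,1} - \delta_{j,0}). \tag{7}$$

Using Eqs. (6) and (7) in Eq. (3), we find that the first
derivative of $H(q)$, evaluated at $q = 0$, may be expressed
as

$$H'(0) = \sum_{k,c} \zeta(k,c) \sum_{m=0}^{c-1} \left[ (k - c)\left(F^k_{m+1} - F^k_m\right) R^{c-1}_m(0) + F^k_m\, \frac{d}{dq} R^{c-1}_m(q)\bigg|_{q=0} \right]. \tag{8}$$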

This is the left-hand side of our cascade condition. Note 
that because this expression depends on $R^{c-1}_m(0)$ and the
first derivative of $R^{c-1}_m(q)$ at $q = 0$ it becomes an increasingly
arduous task to calculate $H'(0)$ from Eq. (8) as the
size of the largest clique in our graph increases. As we
shall see in the next section, the evaluation of the function
$R^{c-1}_m(q)$ becomes increasingly difficult as the value
of c increases. For this reason, in our analysis in Sec. IV
we will choose $\gamma(k,c)$ such that the cliques in our graphs
are constrained to sizes of c < 4. In addition, we shall
make the simplifying assumption that $F^k_0 = 0$ (see [26]).
This implies that a vertex will never activate if none of 
its neighbors are active and is a suitable approximation 
for the calculation of our first-order condition. 
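
The first-order condition follows from the product rule applied to the summand of Eq. (4) with $x = k - 1$. The lines below are a sketch of that intermediate step, assuming the forms of Eqs. (3) and (4) with $\rho_0 = 0$ and writing $\zeta(k,c)$ for the edge-following probability of Eq. (2); the Kronecker deltas of the binomial results at $q = 0$ collapse the sum over j.

```latex
\begin{align*}
H'(0) &= \sum_{k,c} \zeta(k,c) \sum_{j=0}^{k-c} \sum_{m=0}^{c-1}
  \left[ \frac{d B^{k-c}_j}{dq}\bigg|_{q=0} R^{c-1}_m(0)
       + B^{k-c}_j(0)\, \frac{d R^{c-1}_m}{dq}\bigg|_{q=0} \right] F^k_{m+j} \\
      &= \sum_{k,c} \zeta(k,c) \sum_{m=0}^{c-1}
  \left[ (k-c)\left(F^k_{m+1} - F^k_m\right) R^{c-1}_m(0)
       + F^k_m\, \frac{d R^{c-1}_m}{dq}\bigg|_{q=0} \right].
\end{align*}
```

The first term in the bracket comes from the $j = 1$ and $j = 0$ contributions of the derivative of the binomial PMF, and the second from the single surviving $j = 0$ term of the PMF itself.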



III. ACTIVE CLIQUE NEIGHBORS 

Backtracking slightly in the flow of our presentation, 
we will now derive a concise closed-form expression for 
the probability labelled above as $R^{c-1}_m(q_n)$. Let us begin
by recapitulating the meaning of this label. According
to our earlier definition, it is the probability that m out
of $c - 1$ intermediate-level c-clique vertices are active,
given that their own externally linked children are each
independently active with probability $q_n$ and that the
parent vertex at the top of the c-clique is inactive. In
Fig. 1, for example, $R^3_m(q_n)$ is the probability that m of
the vertex A's three clique neighbors are active, given
that each of the four grandchildren of A (circled) has an
activation probability of $q_n$ and that A is itself inactive.

In considering how to calculate $R^{c-1}_m(q_n)$ in general,
we see immediately that it is not the states of the exter- 
nal grandchildren that will cause us difficulty, but rather 
the fact that the state of each intermediate clique mem- 
ber can influence the states of all other members. In 
our framework, every c-clique has one of its (internally 
linked) members designated as the parent and placed on 
level $n + 1$. This leaves each of the remaining $c - 1$ clique
members on the intermediate level with $k - c + 1$ external
edges to connect to its own children on level n. The probability
that some number j of these children are active is
given by the binomial PMF $B^{k-c+1}_j(q_n)$. Thus the probability
that an intermediate clique member is activated
by its children is quite easy to calculate. In contrast, in
order to deal with the influence of the $c - 1$ clique members
on one another, we will have to consider carefully
the various combinations of states that may exist within 
the intermediate portion of the clique. 

Our first step in tackling this problem is to provide 
a mechanism for the intermediate clique members to be 
activated, which combines both internal and external in- 
fluences. We define 

$$G^{c-2}_d(q_n) = \sum_k \frac{\gamma(k,c)}{p_c} \sum_{j=0}^{k-c+1} B^{k-c+1}_j(q_n)\, F^k_{d+j} \tag{9}$$

for c > 2, as the conditional probability that an intermediate
c-clique vertex will be activated if d of its $c - 2$
clique neighbors on the same level are active, given its
external children are each active with probability $q_n$ and
its parent on level $n + 1$ is inactive. The term $\gamma(k,c)/p_c$
is the degree distribution of vertices that belong to a c-clique,
where $p_c = \sum_k \gamma(k,c)$. The response function
$F^k_{d+j}$ will determine whether d active neighbors plus j
active children are enough to cause activation. Defined
as such, $G^{c-2}_d(q_n)$ provides a fundamental term in which
to express the various possible active configurations, thus
permitting us to begin the procedure of counting.
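
The activation probability of Eq. (9) can be sketched as a short function. The dictionary form of $\gamma(k,c)$, the function names, the threshold response with threshold 0.5, and the single-class example ensemble are all hypothetical and serve only to make the sums concrete.

```python
from math import comb

def binom_pmf(n, j, q):
    """Binomial PMF with n trials and success probability q."""
    return comb(n, j) * q**j * (1.0 - q)**(n - j)

def G(d, c, q, gamma, F):
    """Probability that an intermediate c-clique vertex activates when d of its
    same-level clique neighbors are active and each external child is active
    with probability q. gamma is {(k, c): probability}; F(m, k) is the response."""
    p_c = sum(g for (k, cc), g in gamma.items() if cc == c)
    total = 0.0
    for (k, cc), g in gamma.items():
        if cc != c:
            continue
        n_ext = k - c + 1  # external children of an intermediate clique member
        inner = sum(binom_pmf(n_ext, j, q) * F(d + j, k) for j in range(n_ext + 1))
        total += g / p_c * inner
    return total

def F_threshold(m, k, threshold=0.5):
    # Hypothetical threshold response: activate when m/k reaches the threshold.
    return 1.0 if (k > 0 and m / k >= threshold) else 0.0

gamma = {(4, 3): 1.0}  # hypothetical: every vertex has degree 4 and sits in a triangle
G0 = G(0, 3, 0.1, gamma, F_threshold)
G1 = G(1, 3, 0.1, gamma, F_threshold)
```

In this example an intermediate vertex has two external children and needs two active neighbors in total, so `G0` is the probability that both children are active (0.01) and `G1` the probability that at least one is (0.19). The ordering `G0 <= G1` reflects the nondecreasing response function.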

We consider first the simplest nontrivial case, namely, 
c = 3. Suppose we pick from some arbitrary $\gamma(k,c)$ graph
a vertex with degree k that is also a member of a 3-clique.
If we let this vertex reside on level $n + 1$ of the tree and
also position its $c - 1 = 2$ clique neighbors between level
$n + 1$ and level n below, our task then is to calculate
$R^2_m(q_n)$. To do this, let us refer to Fig. 2 and look at
the possible states of these two vertices in isolation from 
their inactive parent. 

Starting with both vertices inactive (the configuration
labelled $c_0$ in Fig. 2), we first count the possible configurations
of states after one round (i = 1) of synchronous
updates. Since we have started from $c_0$, with both vertices
inactive, the probability of either vertex becoming
active in this first round is simply $G^1_0$. Therefore, each
possible outcome ($c_1$, $c_2$, or $c_3$ in Fig. 2) is determined by
a binomial PMF with probability of success $G^1_0$. Configuration
$c_1$, in which both vertices have remained inactive,
will occur with probability $B^2_0(G^1_0)$. Similarly, configuration
$c_2$, in which one vertex has been activated and the
other has remained inactive, will occur with probability
$B^2_1(G^1_0)$. Finally, configuration $c_3$, in which both vertices
have been activated, will occur with probability $B^2_2(G^1_0)$.
[Note that in each term $G^{c-2}_d \equiv G^{c-2}_d(q_n)$; we will use
this abbreviation throughout.]

Having determined the three distinct outcomes of the 
first round of updates, we will now categorize each con- 
figuration into either of two types: terminal or volatile. 
In a terminal configuration no further changes of state 
are possible because all vertices have reached their own 
steady-state of either permanent activation or inactiva- 
tion. In a volatile configuration, however, there exists at 
least one inactive vertex that is liable to become active. 
Thus, as long as volatile configurations are produced we
must continue with another round of updates. The process
of updating will reach its end when all configurations
are terminal. Categorizing the outcomes of round one
tells us whether or not a second round is necessary and
also indicates which configurations need to be updated.
Configuration $c_1$ is clearly terminal since the transition
from $c_0$ to $c_1$ has established that neither vertex can activate
while the other remains inactive. Similarly, $c_3$ is
also terminal for the simple reason that we do not allow
active vertices to revert to being inactive. Configuration
$c_2$, however, is volatile since the transition from $c_0$ to
$c_2$ has shown us that one of these vertices can activate
without the other first being active, but that the same
is not true of this other vertex. That is to say, we know
that the inactive vertex in $c_2$ cannot activate without an
active neighbor. What is not clear from $c_2$ is whether the
vertex that did activate in round one is now sufficient to
activate the vertex that remained inactive in that round.
The only way to determine this is to run a second round
(i = 2) of updates on $c_2$.

FIG. 2. (Color online) Transition probabilities for a pair ($c - 1 = 2$)
of intermediate clique neighbors in a $\gamma(k,c)$ graph.
Colour indicates vertex state: light gray, inactive; dark gray
(green), active.

As was the case in the first round, to begin the sec- 
ond round we must provide an appropriate probability 
of activation. We want to know if the active vertex in
$c_2$ is enough to activate the inactive vertex in $c_2$, given
that the inactive vertex cannot activate without an active
neighbor. This can be decided upon by using the
activation probability $\xi_1(0, 1)$ defined by the function

$$\xi_{c-2}(a, b) = \frac{G^{c-2}_b(q_n) - G^{c-2}_a(q_n)}{1 - G^{c-2}_a(q_n)}. \tag{10}$$

Equation (10) gives us the conditional probability that
in a clique of $c - 1$ intermediate vertices $b$ active vertices
are enough to cause the activation of one of their
inactive clique neighbors, given that $a$ active vertices are
insufficient to do so. The function $\xi_{c-2}(a, b)$ is defined
for $0 \le a \le b \le c - 2$ and is non-negative for all such
values since by Eq. (9) $G^{c-2}_d(q_n)$ is a nondecreasing function
of d. This latter property is true of $G^{c-2}_d(q_n)$ since
$F^k_m$ is defined (see Sec. I) to be a nondecreasing function
of m [and therefore so is $F^k_{d+j}$ in Eq. (9)]. The configurations
produced by updating with this probability
are once again given by a binomial PMF. With probability
$B^1_0(\xi_1(0, 1))$ the inactive vertex will remain inactive,
thereby producing configuration $c_4$. Conversely,
with probability $B^1_1(\xi_1(0, 1))$ the inactive vertex will activate,
thereby producing configuration $c_5$. Categorizing
$c_4$ and $c_5$, we find both configurations are terminal and
therefore the process of updating may now cease.
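
The conditional probability of Eq. (10) can be sketched as a small helper, read here as the probability that b active neighbors suffice given that a do not, i.e., the increase in activation probability divided by the probability of not having activated with a neighbors. The function name `xi`, the handling of the degenerate case in which the denominator vanishes, and the numerical values are assumptions for illustration.

```python
def xi(G_a, G_b):
    """Probability that b active clique neighbors suffice for activation,
    given that a active neighbors did not, where G_a and G_b are the
    activation probabilities with a and b active neighbors."""
    if G_a >= 1.0:
        return 0.0  # assumed convention: no vertex can still be inactive in this case
    return (G_b - G_a) / (1.0 - G_a)

# Hypothetical activation probabilities with zero and one active clique neighbor.
G0, G1 = 0.3, 0.6
p_second_round = xi(G0, G1)        # probability that the inactive vertex activates
p_stay_inactive = 1.0 - p_second_round
```

With these values the vertex that stayed inactive in round one activates in round two with probability 3/7 and remains inactive with probability 4/7.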

With all terminal configurations now achieved, the
next step in our derivation of $R^2_m(q_n)$ is to combine the
various transition probabilities listed in Fig. 2 and use
them to calculate each of $R^2_0(q_n)$, $R^2_1(q_n)$, and $R^2_2(q_n)$.
Tracing our way through Fig. 2, we reach a terminal state
in which no vertices are active by following the route
$c_0 \to c_1$. Similarly, we end with one active vertex by following
$c_0 \to c_2 \to c_4$. Finally, a terminal state with two
active vertices is given by either of the routes $c_0 \to c_3$ or
$c_0 \to c_2 \to c_5$. All of this information can be expressed
succinctly using the various transition probabilities associated
with each route if we bear in mind that a transition
from one configuration to another, symbolized by $\to$, corresponds
to the multiplication of probabilities, and also
that the word "or" corresponds to addition. To summarize,
the set of routes described here yields the following set
of equations:

$$R^2_0(q_n) = B^2_0(G^1_0), \tag{11}$$

$$R^2_1(q_n) = B^2_1(G^1_0)\, B^1_0(\xi_1(0, 1)), \tag{12}$$

$$R^2_2(q_n) = B^2_1(G^1_0)\, B^1_1(\xi_1(0, 1)) + B^2_2(G^1_0). \tag{13}$$
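
The three route probabilities for a pair of intermediate clique neighbors can be checked numerically. The sketch below assumes the second-round probability takes the conditional form $(G^1_1 - G^1_0)/(1 - G^1_0)$; the function names and the values of $G^1_0$ and $G^1_1$ are hypothetical.

```python
from math import comb

def binom_pmf(n, j, p):
    """Binomial PMF with n trials and success probability p."""
    return comb(n, j) * p**j * (1.0 - p)**(n - j)

def xi(G_a, G_b):
    # Assumed conditional form of the second-round activation probability.
    return 0.0 if G_a >= 1.0 else (G_b - G_a) / (1.0 - G_a)

def pair_distribution(G0, G1):
    """Probabilities of ending with 0, 1, or 2 active vertices in the pair."""
    x = xi(G0, G1)
    r0 = binom_pmf(2, 0, G0)                                          # c0 -> c1
    r1 = binom_pmf(2, 1, G0) * binom_pmf(1, 0, x)                     # c0 -> c2 -> c4
    r2 = binom_pmf(2, 1, G0) * binom_pmf(1, 1, x) + binom_pmf(2, 2, G0)  # c0 -> c2 -> c5, or c0 -> c3
    return [r0, r1, r2]

# Hypothetical activation probabilities with zero and one active clique neighbor.
R = pair_distribution(0.3, 0.6)
```

For these values the three outcomes have probabilities 0.49, 0.24, and 0.27, which sum to one as required of a distribution over terminal configurations.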

The final step towards our goal of writing a closed-form
expression for $R^2_m(q_n)$ is to find a way of expressing
Eqs. (11)-(13) as the outputs of a single function that
has been given the inputs m = 0, 1, and 2, respectively.
There may be a number of different ways of defining such
a function, some of which may appear more elegant than
others. For our own part, we can offer a particularly concise
definition by introducing a new variable and considering
how the various combinations of states determined
by Eqs. (11)-(13) can be reproduced in a parsimonious
manner.

Our new variable is called $l_i$. We define it as the number
of new activations in round i of synchronous updates.
In the scheme presented above we had two rounds; therefore,
we define the pair $l = (l_1, l_2)$ as the sequence of new
activations over both rounds. This allows us to represent
all possible routes through the configurations of Fig. 2
as a collection of ordered pairs. For example, $l = (1, 0)$
means that there is one activation in round i = 1 and
no activations in round i = 2 and therefore corresponds
to the route $c_0 \to c_2 \to c_4$. Similarly, $l = (1, 1)$ corresponds
to $c_0 \to c_2 \to c_5$. By applying this notation we
find that the following equation will reproduce each of
the Eqs. (11)-(13) above:

$$R^2_m(q_n) = \sum_{l_1 + l_2 = m} B^2_{l_1}(G^1_0)\, B^{2 - l_1}_{l_2}(\xi_1(0, l_1)). \tag{14}$$

Note that the summation $\sum_{l_1 + l_2 = m}$ in Eq. (14) is taken
over all pairs $l = (l_1, l_2)$ such that $l_1 + l_2 = m$, where m
is the total number of active vertices.

To demonstrate how Eq. (14) operates let us calculate
$R^2_1(q_n)$ by setting m = 1. The set of all $l$ pairs that add
up to this value of m is $l \in \{(0, 1), (1, 0)\}$. Substituting
each of these pairs in turn into the right-hand side of
Eq. (14) and then summing gives $R^2_1(q_n) = [0 + 2G^1_0(1 - G^1_1)]$,
thereby reproducing Eq. (12) above. The values
of $R^2_0(q_n)$ and $R^2_2(q_n)$ are found similarly by using the
parameters m = 0 and $l = (0, 0)$, and m = 2 and
$l \in \{(0, 2), (1, 1), (2, 0)\}$, respectively.
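
The sum over ordered pairs in Eq. (14) can be sketched as a loop over $l_1$ with $l_2 = m - l_1$. The function names and numerical values are hypothetical. One implementation choice is an assumption of this sketch rather than part of the text: when no inactive vertex remains after round one, the second factor is taken to be one, since a binomial PMF over zero trials is one regardless of its success probability.

```python
from math import comb

def binom_pmf(n, j, p):
    """Binomial PMF; Python evaluates 0.0**0 as 1.0, so p = 0 is handled."""
    return comb(n, j) * p**j * (1.0 - p)**(n - j)

def xi(G_a, G_b):
    # Assumed conditional form of the second-round activation probability.
    return 0.0 if G_a >= 1.0 else (G_b - G_a) / (1.0 - G_a)

def R2(m, G0, G1):
    """Probability that m of the two intermediate vertices end up active,
    summed over all activation sequences (l1, l2) with l1 + l2 = m."""
    G = [G0, G1]
    total = 0.0
    for l1 in range(m + 1):
        l2 = m - l1
        remaining = 2 - l1          # vertices still inactive after round one
        if l2 > remaining:
            continue                # more new activations than inactive vertices
        first = binom_pmf(2, l1, G0)
        if remaining == 0:
            second = 1.0            # nothing left to update (l2 is 0 here)
        else:
            second = binom_pmf(remaining, l2, xi(G0, G[l1]))
        total += first * second
    return total

G0, G1 = 0.3, 0.6                   # hypothetical activation probabilities
values = [R2(m, G0, G1) for m in range(3)]
```

For m = 1 the pair (0, 1) contributes nothing because the second-round probability is zero when no vertex activated in round one, leaving $2G^1_0(1 - G^1_1) = 0.24$.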

Thus, in Eq. (14) we have found an expression for
$R^2_m(q_n)$, which, we remind ourselves once more, is the
conditional probability that m of the two intermediate
vertices in a 3-clique are active, given that each of their
own children is active with probability $q_n$, and that the
vertex at the top of the clique is inactive. Recall, however,
that our ultimate goal is to provide a general expression
for $R^{c-1}_m(q_n)$. Our approach to this problem has
been to determine a series of expressions for increasing
values of c and then to express each of these as special
cases of a single unifying expression. Each individual expression
for $R^{c-1}_m(q_n)$, where c > 3, can be found by a
method similar to the one described above for $R^2_m(q_n)$.
The core of this method is the same regardless of the
value of c and can be summarized in general as follows.

(i) Simultaneously update the states of all inactive ver- 
tices. 

(ii) Categorize the resulting configurations of states as 
either terminal or volatile, removing those that are 
terminal from further consideration. 

(iii) Repeat steps (i) and (ii) until no volatile configura- 
tions remain. 
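
Steps (i)-(iii) can be sketched as a recursion over rounds of updates for any number of intermediate vertices. This is an illustrative reading of the procedure rather than a definitive implementation: it assumes that, in each round after the first, every remaining inactive vertex activates independently with the conditional probability $(G_b - G_a)/(1 - G_a)$, where a and b are the numbers of active vertices before and after the previous round. The function names and the values in `G` are hypothetical.

```python
from math import comb

def binom_pmf(n, j, p):
    """Binomial PMF with n trials and success probability p."""
    return comb(n, j) * p**j * (1.0 - p)**(n - j)

def xi(G_a, G_b):
    # Assumed conditional form of the activation probability after round one.
    return 0.0 if G_a >= 1.0 else (G_b - G_a) / (1.0 - G_a)

def final_active_distribution(G):
    """G[d] is the activation probability with d active clique neighbors,
    d = 0, ..., v - 1, for v intermediate vertices. Returns R[m], m = 0, ..., v."""
    v = len(G)
    R = [0.0] * (v + 1)

    def update(prob, prev_active, active):
        # Step (i): simultaneously update all inactive vertices.
        n_inactive = v - active
        p = G[0] if active == 0 else xi(G[prev_active], G[active])
        for new in range(n_inactive + 1):
            w = prob * binom_pmf(n_inactive, new, p)
            if w == 0.0:
                continue
            # Step (ii): terminal if nothing changed or every vertex is active.
            if new == 0 or active + new == v:
                R[active + new] += w
            else:
                # Step (iii): volatile configurations get another round.
                update(w, active, active + new)

    update(1.0, 0, 0)
    return R

R_pair = final_active_distribution([0.3, 0.6])          # two intermediate vertices
R_triple = final_active_distribution([0.2, 0.5, 0.8])   # three intermediate vertices
```

For two vertices the sketch returns the same three probabilities as the route counting of Fig. 2, and for three vertices the four outcomes again sum to one.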

Counting the terminal configurations will then provide
the various outcomes obtainable in the steady state of
the cascade. For example, in determining $R^{3}_{m}(q_n)$, the
application of these three steps reveals every possible
active configuration in a triangle of connected vertices and
each associated transition probability. As above, following
the different routes towards each terminal configuration
indicates the correct sequence of multiplications and
additions to employ to calculate the values of $R^{3}_{m}(q_n)$ for
$0 \leq m \leq 3$. This procedure yields the following set of
equations:

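\[ R^{3}_{0}(q_n) = B^{3}_{0}(G^{2}_{0}), \qquad (15) \]

\[ R^{3}_{1}(q_n) = B^{3}_{1}(G^{2}_{0})\, B^{2}_{0}(\xi_2(0,1)), \qquad (16) \]

\[ R^{3}_{2}(q_n) = B^{3}_{2}(G^{2}_{0})\, B^{1}_{0}(\xi_2(0,2))
   + B^{3}_{1}(G^{2}_{0})\, B^{2}_{1}(\xi_2(0,1))\, B^{1}_{0}(\xi_2(1,2)), \qquad (17) \]

\[ R^{3}_{3}(q_n) = B^{3}_{1}(G^{2}_{0})\, B^{2}_{1}(\xi_2(0,1))\, B^{1}_{1}(\xi_2(1,2))
   + B^{3}_{3}(G^{2}_{0}) + B^{3}_{2}(G^{2}_{0})\, B^{1}_{1}(\xi_2(0,2))
   + B^{3}_{1}(G^{2}_{0})\, B^{2}_{2}(\xi_2(0,1)). \qquad (18) \]

Continuing in the same manner as before, an expression
for $R^{3}_{m}(q_n)$ that contains Eqs. (15)-(18) as special
cases can be defined by applying the variable $m$
and considering each unique sequence of activations
$l = (l_1, l_2, l_3)$. By doing this we have found that the equation

\[ R^{3}_{m}(q_n) = \sum_{l_1+l_2+l_3=m} B^{3}_{l_1}(G^{2}_{0})\, B^{3-l_1}_{l_2}(\xi_2(0,l_1))
   \times B^{3-(l_1+l_2)}_{l_3}(\xi_2(l_1, l_1+l_2)) \qquad (19) \]

will reproduce Eqs. (15)-(18).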

Observe the similarities between Eqs. (19) and
(14). They indicate that to create an expression for
$R^{3}_{m}(q_n)$ from that for $R^{2}_{m}(q_n)$ above all one must do
(besides set $c = 4$) is place additional indices $l_2$ and $l_3$ in the
appropriate positions and include one more multiplicative
term, namely, $B^{3-(l_1+l_2)}_{l_3}(\xi_2(l_1, l_1+l_2))$. By running
the entire scheme of categorization and route counting
over again with $c = 5$ and $l = (l_1, l_2, l_3, l_4)$, we have
observed (in calculations not provided here) that a similar
relationship also holds between $R^{4}_{m}(q_n)$ and $R^{3}_{m}(q_n)$.
The pattern of similarities detected in our calculations
strongly suggests the following form for a general
expression for $R^{v}_{m}(q_n)$, where $v$ is an integer $v \geq m$:

\[ R^{v}_{m}(q_n) = \sum_{|l|=m} \prod_{i=1}^{v} B^{n_{v,i}}_{l_i}(\theta_{v,i}). \qquad (20) \]

Let us unpack this expression. First, note that the
variable $n_{v,i}$ in Eq. (20) is defined as
$n_{v,i} = v - \sum_{j=1}^{i-1} l_j$ for $i \geq 2$ with $n_{v,1} = v$.
Next, the variable $\theta_{v,i}$ is defined as
$\theta_{v,i} = \xi_{v-1}\left(\sum_{j=1}^{i-2} l_j, \sum_{j=1}^{i-1} l_j\right)$
for $i \geq 3$ with $\theta_{v,1} = G^{v-1}_{0}$ and
$\theta_{v,2} = \xi_{v-1}(0, l_1)$. Finally, the term $|l|$ in the
summation of Eq. (20) is defined in multi-index notation
(see, for example, [38]) as $|l| = l_1 + \ldots + l_v$.
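A direct evaluation of Eq. (20) can be sketched as a recursion over the
entries of the multi-index. The sketch assumes that $B^{n}_{k}(p)$ is the
binomial probability and that $\xi(a,b) = (G_b - G_a)/(1 - G_a)$; the
list `G` (with `G[a]` standing in for $G^{v-1}_{a}$) and all function
names are hypothetical and introduced only for this illustration.

```python
from math import comb


def binom_pmf(n, k, p):
    """Binomial probability of k successes in n trials of probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def xi(G, a, b):
    """Assumed form of xi(a, b); equals zero when a == b."""
    if b == a or G[a] >= 1.0:
        return 0.0
    return (G[b] - G[a]) / (1.0 - G[a])


def R(v, m, G):
    """Sum over multi-indices l with |l| = m of prod_i B^{n_i}_{l_i}(theta_i)."""

    def recurse(i, prev_sum, cur_sum, left):
        # i: 1-based factor index
        # prev_sum = l_1 + ... + l_{i-2}, cur_sum = l_1 + ... + l_{i-1}
        # left: part of m still to be distributed over l_i, ..., l_v
        if i > v:
            return 1.0 if left == 0 else 0.0
        n_i = v - cur_sum
        if n_i == 0:
            theta = 0.0  # no inactive vertices remain, so theta is irrelevant
        elif i == 1:
            theta = G[0]
        else:
            theta = xi(G, prev_sum, cur_sum)
        total = 0.0
        for l_i in range(min(left, n_i) + 1):
            b = binom_pmf(n_i, l_i, theta)
            if b != 0.0:
                total += b * recurse(i + 1, cur_sum, cur_sum + l_i, left - l_i)
        return total

    return recurse(1, 0, 0, m)
```

With `v = 2` this reduces to the triangle case, and summing `R(v, m, G)`
over `m = 0, ..., v` returns 1, as expected for a probability distribution.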

By setting $v = c - 1$ in Eq. (20), we have the probability
$R^{c-1}_{m}(q_n)$ expressed in closed form [39]. Applying
this definition in Eqs. (3)-(5) (see Sec. II) completes our
analytical description of cascades on clique-based graphs
and permits us to proceed with the task of verifying our
approach. We will provide this verification in the next
section by comparing predicted values of the expected
cascade size from Eq. (5) against the results of numerical
simulations of bond percolation and Watts's model.

It must be noted, however, that as the size of the
largest clique in our graph, $c_{\max}$, increases it becomes
more and more computationally intensive to evaluate
$R^{c-1}_{m}(q_n)$ using Eq. (20). This is primarily because of
the exponentially increasing number of possible combinations
for the multi-index $l$ as the number of active
clique members to be counted, $m$, increases. It can be
shown that, for $m \geq 1$, the number of different choices of $l$
that give nonzero contributions to the sum in Eq. (20) is $2^{m-1}$.
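The count of contributing multi-indices can be checked by brute force
for small cases. The sketch below assumes that a term of Eq. (20)
vanishes whenever a zero entry of $l$ is followed by a positive one
(that is, activations cannot resume after an update in which nothing
changed); the function name is hypothetical.

```python
from itertools import product


def contributing_multi_indices(v, m):
    """Multi-indices l of length v with |l| = m in which no zero entry
    precedes a positive entry (assumed condition for a nonzero term)."""
    keep = []
    for l in product(range(m + 1), repeat=v):
        if sum(l) != m:
            continue
        seen_zero = False
        ok = True
        for l_i in l:
            if l_i == 0:
                seen_zero = True
            elif seen_zero:
                ok = False
                break
        if ok:
            keep.append(l)
    return keep
```

The surviving multi-indices are the ordered ways of writing $m$ as a sum
of positive integers, padded with trailing zeros, which is why their
number doubles each time $m$ increases by one.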



IV. SIMULATIONS 

To test the theory of the previous two sections we require
an appropriate set of definitions for the response
function $F^{k}_{m+j}$, corresponding to the processes in our
familiar broad class (see Sec. I). The function $F$, however,
is the same one that has been used throughout our
group's previous publications [5, 15, 26]. Gleeson began
in [5] by writing it in its simplest generalized form: $F^{k}_{m}$.
There, it defined the probability that a $k$-degree vertex
in a locally tree-like graph may be activated by $m$ active
neighbors. In [26], $F^{s+2t}_{m}$ gave the probability that a
$k$-degree vertex in an edge-triangle graph may be activated
by $m$ active neighbors, where $k = s + 2t$. In the current
presentation, $F^{k}_{m+j}$ prescribes the probability that a
$k$-degree vertex in a clique-based graph may be activated by
$m + j$ active neighbors, where $j$ and $m$ are the numbers of
active external and internal neighbors, respectively. Since $F$ has
not changed (only its arguments have), the same justifications
of our use of the response function mechanism as
were given in [26] apply equally here. Therefore, similarly
to [26], the definitions of $F^{k}_{m+j}$ for different processes are
found by replacing $m$ with $m + j$ in the definitions of
$F^{k}_{m}$ given in [5]. With this aspect clarified, we can begin
testing our approach against numerical simulations
of various processes.

A. Bond percolation 

We consider first the process of uniform bond percolation.
In this process each edge of the graph (external or
internal) is deleted with probability $1 - \phi_b$. The quantity
$\phi_b$ is the bond occupation probability and nondamaged
edges are termed occupied. Replacing $m$ with $m + j$ in
Eq. (6) of [5] defines $F^{k}_{m+j}$ for this process:

\[ F^{k}_{m+j} = 1 - (1 - \phi_b)^{m+j}. \qquad (21) \]

Applying this definition in the respective $\rho_0 \to 0$ limits
of Eqs. (3)-(5) above allows us to use these equations
to calculate the expected GCC size $S$ of a clique-based
graph, which is nonzero when $\phi_b$ exceeds a critical value.
This critical value is known as the bond percolation threshold.
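As a minimal illustration, Eq. (21) can be written as a one-line
function. The function and argument names are hypothetical; the formula
itself says that a vertex fails to be reached only if every one of its
$m + j$ edges to active neighbors has been deleted.

```python
def response_bond_percolation(m, j, phi_b):
    """Eq. (21): response to m internal and j external active neighbors.

    The value does not depend on the degree k, so k is not an argument here.
    """
    return 1.0 - (1.0 - phi_b) ** (m + j)
```

With no active neighbors the response is zero, and with a single active
neighbor it equals the bond occupation probability.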

In Fig. 3 we have plotted our calculations of $S$ from
Eq. (5) against the results of numerically simulated
$\gamma(k, c)$ graphs (see the caption). The parameters chosen
for this figure are the same as those used in Fig. 3(a)
of [15]. Each graph has a Poisson degree distribution
$p_k = z^{k} e^{-z}/k!$ with mean degree $z = 3$. Following [15],
we set $\gamma(k,c) = [(1 - \alpha - \beta)\delta_{c,1} + \alpha\delta_{c,3} + \beta\delta_{c,4}]\, p_k$ for
$k \geq 3$, where $\alpha, \beta \in [0, 1]$. In this way we create nonzero
clustering by assigning a fraction $\alpha$ of $k$-degree vertices
to 3-cliques and a fraction $\beta$ to 4-cliques. Additionally,
since a 2-degree vertex cannot belong to a clique of size
$c > 3$, we assign a fraction $\alpha$ of these vertices to 3-cliques
using $\gamma(2, c) = [(1 - \alpha)\delta_{c,1} + \alpha\delta_{c,3}]\, p_2$. We let vertices of
degree zero or one belong to 1-cliques: $\gamma(k, c) = p_k \delta_{c,1}$.
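The three cases of this construction can be collected into a single
function. This is a sketch only: the function names and the default
`z = 3.0` are chosen here for illustration, and $\gamma(k, c)$ is read
as the joint probability that a vertex has degree $k$ and belongs to a
$c$-clique.

```python
from math import exp, factorial


def poisson_pk(k, z):
    """Poisson degree distribution p_k = z^k e^{-z} / k!."""
    return z**k * exp(-z) / factorial(k)


def gamma(k, c, alpha, beta, z=3.0):
    """gamma(k, c) for the three cases k >= 3, k == 2, and k in {0, 1}."""
    pk = poisson_pk(k, z)
    if k >= 3:
        weights = {1: 1.0 - alpha - beta, 3: alpha, 4: beta}
    elif k == 2:
        weights = {1: 1.0 - alpha, 3: alpha}
    else:
        weights = {1: 1.0}
    # clique sizes other than 1, 3, and 4 never occur in this construction
    return weights.get(c, 0.0) * pk
```

Summing `gamma(k, c, alpha, beta)` over `c` recovers `poisson_pk(k, z)`
for every `k`, so the clique assignment leaves the degree distribution
unchanged.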
This choice of $\gamma(k, c)$ limits the largest clique size to
$c_{\max} = 4$ and therefore makes the evaluation of $R^{c-1}_{m}(q_n)$
relatively simple. By varying $\alpha$ and $\beta$ different levels of
clustering can be prescribed. Again following [15], we
use three $(\alpha, \beta)$ pairs: (0,0), (0.8,0.1), and (0,1). Evidently,
(0,0) produces a nonclustered graph (downward-pointing
triangles). We can use Eq. (2) of [15] to define
the global clustering coefficient $C_2 = \sum_k p_k c_k$. From this
one may show that (0.8,0.1) produces a clustered graph
with $C_2 = 0.31$ (squares), and also that (0,1) gives a
graph with $C_2 = 0.35$ (upward-pointing triangles).

The percolation thresholds for each nonzero value of $C_2$
can be calculated from our cascade condition of Sec. II by
setting $H'(0) = 1$ in Eq. (8) and solving for $\phi_b$ (see [26]).
This of course requires that we first substitute Eq. (21)
into Eq. (8). We also require the following results for the
function $G^{c-2}_{m}(q)$ of Eq. (9) in order to evaluate $R^{c-1}_{m}(0)$
and the first derivative of $R^{c-1}_{m}(q)$ at $q = 0$:

\[ G^{c-2}_{m}(0) = \sum_{k} [\ldots]\, F^{k}_{m}, \qquad (22) \]

\[ \frac{dG^{c-2}_{m}}{dq}(0) = \sum_{k} [\ldots]\, (k - c + 1)\left(F^{k}_{m+1} - F^{k}_{m}\right). \qquad (23) \]

Using Eqs. (21)-(23) in Eq. (8) we calculate the threshold
for $C_2 = 0.31$ to be $\phi_b = 0.349$, while for $C_2 = 0.35$
we get $\phi_b = 0.423$. The threshold for $C_2 = 0$ is simply
the configuration model value $\phi_b = 1/z$ [40].

The match obtained between theory and numerics in
Fig. 3 provides a clear validation of our approach in the
case of bond percolation. Furthermore, because we have
chosen the same parameters as Fig. 3(a) of [15], the results
shown in that figure should correspond exactly with
the results shown here in Fig. 3. Comparing these two figures
will reveal to the reader that they do indeed match.
This illustrates that our approach contains within its
scope the ability to produce the same predicted values
of $S$ as the theory of [15]. However, as noted earlier
at the beginning of Sec. II, Gleeson's equations depend
on a set of polynomial functions defined and tabulated
in [36]. These polynomials limit the application of his
equations to bond percolation. The advantage of our
approach over that of [15] is its purported applicability
to other processes besides bond percolation. To confirm
that it really does possess this flexibility we consider for
our second test Watts's model [4].

FIG. 3. (Color online) Bond percolation on $\gamma(k, c)$ graphs
of $N = 10^{5}$ vertices and Poisson degree distribution $p_k$ with
mean degree $z = 3$. Numerical simulations (symbols) averaged
over 100 realizations and theory of Sec. II (lines) on a
plot of GCC size $S$ vs. bond occupation probability $\phi_b$.



B. Watts's model 

Watts's model provides a simplified description of 
threshold-dependent cascade dynamics on complex net- 
works. In a sociological setting this model may provide 
a crude approximation of the processes of contagion that 
underlie such phenomena as fashions, rumours, or popu- 
lar opinions. Given, for example, a network of acquain- 
tanceships between a group of people, we can use Watts's 
model to calculate the steady-state fraction of active ver- 
tices in the following binary-state decision process. 

We begin by assigning a threshold $r_i$ drawn from the
probability distribution $q(r)$ to each vertex $1 \leq i \leq N$
in the network. At each discrete time step $t$ the state
of vertex $i$ is $v_i(t) \in \{0, 1\}$, where $v_i(t) = 1$ indicates the
participation of $i$ in the cascade and $v_i(t) = 0$ indicates
nonparticipation. The dynamics is instigated by activating
a small seed fraction of vertices at $t = 0$. From $t = 1$
until the steady state is reached, the state of each vertex is updated
synchronously at each $t$ according to the rule

\[ v_i(t) = \begin{cases} 1 & \text{if } \frac{1}{k_i} \sum_j a_{ij} v_j(t) > r_i, \\
   \text{unchanged} & \text{otherwise,} \end{cases} \qquad (24) \]

where $a_{ij}$ is the value in position $(i,j)$ of the adjacency
matrix of the network and $k_i$ is the degree of vertex $i$.
By this mechanism vertex $i$ will join the cascade if the
fraction of its direct neighbors that are active exceeds its
threshold; otherwise it will remain inactive. Once active,
$i$ will remain in this state.

In the steady state the final fraction of active vertices
is given by the average of $v_i(t)$ over all vertices. By averaging
this last value over many individual runs of the model we can
determine a numerical evaluation of the expected cascade size $\rho$.
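A single run of this update rule can be sketched as follows. The
function name and the adjacency-list input are hypothetical, the sum
over neighbors is taken over the states from the preceding step (which
is how the synchronous update is interpreted here), and isolated
vertices are assigned an active fraction of zero, which is an assumption
of this sketch.

```python
def watts_cascade(neighbors, thresholds, seeds):
    """Apply synchronous threshold updates until no vertex changes state.

    neighbors[i] lists the vertices adjacent to i, thresholds[i] is r_i,
    and seeds is the set of initially active vertices.
    Returns the final fraction of active vertices.
    """
    n = len(neighbors)
    state = [1 if i in seeds else 0 for i in range(n)]
    while True:
        new_state = state[:]
        for i in range(n):
            if state[i] == 1:
                continue  # once active, a vertex stays active
            k_i = len(neighbors[i])
            # assumption: an isolated vertex sees an active fraction of zero
            fraction = sum(state[j] for j in neighbors[i]) / k_i if k_i else 0.0
            if fraction > thresholds[i]:
                new_state[i] = 1
        if new_state == state:
            return sum(state) / n
        state = new_state
```

On a four-vertex path with vertex 0 seeded, thresholds of 0.4 let the
cascade travel along the whole path, whereas thresholds of 0.6 stop it
at the seed. Averaging the returned value over many runs with
independently drawn thresholds gives the numerical estimate of the
expected cascade size.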

With the appropriate choice of response function
$F^{k}_{m+j}$, our Eqs. (3)-(5) provide an analytical match to
the numerical results of Watts's model. In Fig. 4 we
present values of $\rho$ from Eq. (5) plotted against the results
of simulations on $\gamma(k, c)$ graphs. The thresholds in
each of these graphs are drawn from a Gaussian distribution:
$q(r) = N(R, 0.1)$ (see the caption). Therefore, the
response function for our equations is defined by replacing
$m$ with $m + j$ in Eq. (2) of [5]:

\[ F^{k}_{m+j} = \frac{1}{2}\left[ 1 + \operatorname{erf}\left( \frac{(m + j)/k - R}{\sigma\sqrt{2}} \right) \right], \qquad (25) \]

where $\sigma = 0.1$ is the standard deviation of the threshold
distribution.

The choice of $q(r) = N(R, 0.1)$ means some vertices will
be assigned negative thresholds and will therefore automatically
activate. This allows us to set $\rho_0 = 0$ in
Eqs. (3)-(5). The structural variables used for this figure
are the same as those applied previously in Fig. 3. All
graphs have Poisson degree distribution $p_k$ with $z = 3$
and $\gamma(k, c)$ is defined by the same three equations as
above. We apply two $(\alpha, \beta)$ pairs, (0,0) and (0.8,0.1),
corresponding to $C_2 = 0$ and $C_2 = 0.31$, respectively.

Figure 4(a) provides a further validation of our approach
and explicitly demonstrates its flexibility. In
Fig. 4(b) we investigate a minor modification to Watts's
model. The presence of neighbors of two distinct kinds
(internal and external) in clique-based graphs opens up
some interesting possibilities for the augmentation of the
updating process described by Eq. (24). To take one
simple example, consider the following weighting scheme.
Let each active internal vertex have weight $w_i \in (0, \infty)$
and each active external vertex have weight $w_e \in (0, \infty)$.
The response function for this process is given by multiplying
$m$ by $w_i$ and $j$ by $w_e$ in Eq. (25). We propose
that such a weighting may provide insights into the role
of group structure in determining the outcome of processes
of social contagion such as those mentioned above.
Problems of this nature have been of interest for quite
some time (see [35] and references therein).

When $w_i = w_e = 1$ we have the conventional version
of Watts's model in which there is no bias in favor of
either type of neighbor [Fig. 4(a)]. However, settings
where $w_i > w_e$ or $w_i < w_e$ indicate a respective bias
in favor of or against one's clique neighbors over one's
external neighbors. If we take the clique as a proxy for a
tightly knit social group, then the first setting describes
a scenario where the influence of one's peers is favored
over influences from outside the immediate peer group.
The second setting describes the opposite scenario.

Figure 4(b) demonstrates why this modification of
Watts's model is interesting from an analytical perspective.
Here we have set $w_i = 1.3$ and $w_e = 0.85$. Comparing
this figure to Fig. 4(a), we see that this simple
change in weighting can cause a significant change in the
expected cascade size $\rho$. In Fig. 4(a) the nonclustered
graph produces a larger value of $\rho$ than the clustered
graph at every value of the threshold distribution mean
$R$. However, in Fig. 4(b) this trend is reversed in the
region from approximately $R = 0.26$ upward. Based on
this observation, we submit that weighted models such as
the one provided here may offer new insights into the effects
of clustering and decision bias in cascades on social
networks [41]. We leave the analysis and modification of
this weighted model open to further investigation.

FIG. 4. (Color online) Watts's model on $\gamma(k, c)$ graphs of
$N = 10^{6}$ vertices and Poisson degree distribution $p_k$ with
mean degree $z = 3$. Thresholds are drawn from a Gaussian
distribution with mean $R$ and standard deviation $\sigma = 0.1$.
Numerical simulations (symbols) averaged over 100 realizations
and theory of Sec. II (lines) on a plot of cascade size
$\rho$ vs. $R$. In (a) $w_i = 1$ and $w_e = 1$. In (b) $w_i = 1.3$ and
$w_e = 0.85$.



V. CONCLUSION



We have extended Gleeson and Cahalane's [27] analyt- 
ical approach to modeling cascading phenomena on con- 
figuration model graphs to the highly clustered clique- 
based graphs defined by Gleeson in [15]. An analytical 
expression for the expected cascade size and a first-order 
cascade condition have been derived. The use of the gen- 
eralized response function mechanism in these expres- 
sions permits their application to a range of processes 
that includes site and bond percolation, $k$-core decomposition,
and Watts's threshold model.

We have validated our approach against numerical sim- 
ulations of bond percolation and Watts's model. In addi- 
tion, we have proposed a modification of Watts's model 
that employs the unique structure of clique-based graphs 
in an investigation of the role of group influence in pro- 
cesses of social contagion. This presents rich ground for 
further investigation. The analytical framework provided 
by us here may be useful for such studies. 

Perhaps the most significant aspect of our contribu- 
tion is the derivation of a closed-form expression for the 
steady-state fraction of active vertices inside a clique of 
arbitrary size. We anticipate that this expression will find
additional applications outside of the current setting.

Finally, there are a number of significant challenges 
that we have yet to address in our broad study of cascades 
on clustered graphs. We have now provided approaches 
for a class of monotone binary-state processes on both 
edge-triangle graphs [26] and clique-based graphs; there 
are two directions in which we would like to extend this 
work. First, we would like to modify our techniques 
to investigate nonmonotone processes. The groundwork 
for this has been laid in [42]. Second, we would like 
to investigate cascades on a more sophisticated class of 
highly clustered graphs than those dealt with so far. Such 
classes have been described in [43].

Appendix: Clusters in damaged graphs

The results illustrated in Fig. 3 demonstrate the equiv- 
alence of the approach to bond percolation provided in 
[15] and the corresponding approach provided here. By 
working through the equations of [15] and those of this 
paper, one may show that the match between the two 
approaches hinges on the following equation: 



\[ \sum_{m=1}^{c} P(m|c)\, x^{m-1} = \sum_{m=0}^{c-1} R^{c-1}_{m}\, q^{m}, \qquad (A.1) \]

where $x = 1 - G^{c-2}_{0}$ and $q = 1 - \phi_b$. On the left-hand side
of Eq. (A.1) $P(m|c)$ is the probability that in a $c$-clique
that has been damaged by the removal of its edges (each
with independent probability $q$) a connected cluster of $m$
vertices (not necessarily an $m$-clique) remains.

In [36] the probability P(m\c) was evaluated itera- 
tively using a recursive formula; an explicit formula for 
P(m\c) was not provided. By making use of Eq. (A.l) 
we can now write an explicit formula for P(m\c). 

Applying Eq. (20) allows us to expand the right-hand 
side of Eq. (A.l) and thereby rewrite it as 



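\[ \sum_{m=1}^{c} P(m|c)\, x^{m-1} = \sum_{|l| \leq c-1} \binom{c-1}{l_1} q^{l_1}
   \prod_{i=2}^{c-1} \left[ B^{n_{c-1,i}}_{l_i}(\theta_{c-1,i})\, q^{l_i} \right]
   \sum_{j=0}^{l_1} \binom{l_1}{j} (-1)^{j} x^{c-1-l_1+j}. \qquad (A.2) \]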



To equate coefficients of powers of $x$ on the left-hand
side and right-hand side of Eq. (A.2) we simply set
$j = m - c + l_1$. This gives us the following expression for
the probability $P(m|c)$:



\[ P(m|c) = \sum_{|l| \leq c-1} \binom{c-1}{l_1} q^{l_1}
   \prod_{i=2}^{c-1} \left[ B^{n_{c-1,i}}_{l_i}(\theta_{c-1,i})\, q^{l_i} \right]
   \binom{l_1}{m-c+l_1} (-1)^{m-c+l_1}. \qquad (A.3) \]



One may easily verify that Eq. (A.3) satisfies the normalization
condition $\sum_{m=1}^{c} P(m|c) = 1$.
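For small cliques the explicit sum can be compared against a direct
enumeration of all damaged cliques. The sketch below rests on two
assumptions that are not stated in this appendix: that $B^{n}_{k}(p)$ is
the binomial probability, and that for bond percolation
$\theta_{c-1,i}$ reduces to $1 - q^{l_{i-1}}$ for $i \geq 2$. Function
names are hypothetical, and $P(m|c)$ is interpreted as the probability
that a given vertex lies in a connected cluster of $m$ vertices.

```python
from itertools import combinations, product
from math import comb


def binom_pmf(n, k, p):
    """Binomial probability of k successes in n trials of probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def cluster_size_formula(m, c, q):
    """Explicit sum over multi-indices l = (l_1, ..., l_{c-1}), |l| <= c-1."""
    v = c - 1
    total = 0.0
    for l in product(range(v + 1), repeat=v):
        if sum(l) > v:
            continue
        j = m - c + l[0]
        if j < 0 or j > l[0]:
            continue  # the binomial coefficient C(l_1, j) vanishes
        term = comb(v, l[0]) * q ** l[0] * comb(l[0], j) * (-1) ** j
        used = l[0]
        for i in range(1, v):
            n_i = v - used
            theta = 1.0 - q ** l[i - 1]  # assumed bond-percolation form
            term *= binom_pmf(n_i, l[i], theta) * q ** l[i]
            used += l[i]
        total += term
    return total


def cluster_size_brute_force(m, c, q):
    """Enumerate every edge subset of a c-clique (each edge removed with
    probability q) and add up the cases where vertex 0 sits in a cluster of m."""
    edges = list(combinations(range(c), 2))
    total = 0.0
    for kept in product([0, 1], repeat=len(edges)):
        prob = 1.0
        adj = {u: [] for u in range(c)}
        for (u, w), keep in zip(edges, kept):
            prob *= (1.0 - q) if keep else q
            if keep:
                adj[u].append(w)
                adj[w].append(u)
        seen, stack = {0}, [0]
        while stack:  # depth-first search from vertex 0
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if len(seen) == m:
            total += prob
    return total
```

The enumeration grows as $2^{c(c-1)/2}$ and is intended only as a
consistency check for small $c$.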